Sequential Modeling for Identifying Gene Locations in Human Genome

Authors

  • Nilanjan Dasgupta
  • Lawrence Carin
Abstract

We consider several sequential processing algorithms for identifying genes in human DNA, based on detecting CpG islands. The algorithms are designed to capture the underlying statistical structure in a DNA sequence. Sequential processing using a Markov model and a hidden Markov model is shown to identify most CpG islands in annotated (marked) DNA subsequences in publicly available DNA data sets. We also consider a wavelet-based hidden Markov tree (HMT). In the context of the HMT, we address the design of adaptive wavelets matched to CpG islands, implemented via lifting and genetic-algorithm optimization.

DNA is composed of a sequence of units called nucleotides [1]: adenine (A), cytosine (C), guanine (G), and thymine (T). In the human genome, a C nucleotide is generally modified chemically by methylation if it is followed by a G. Methyl-C mutates into a T with high probability. The methylation process is suppressed in localized segments of the genome, often at the “start” regions of many genes. These regions, characterized by a higher concentration of C-G dinucleotides than elsewhere, are called CpG islands (“C precedes G”). Our objective is to produce models for distinguishing variable-length CpG islands from the rest of the DNA sequence. We consider the following algorithms: a
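The Markov-model idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the authors' model details, so what follows is a generic first-order Markov-chain log-odds scorer of the kind commonly used for CpG island detection, not their implementation. The function names (`train_markov`, `log_odds`, `cpg_obs_exp`), the pseudocount, and the tiny training strings are all hypothetical and chosen only for illustration; they are not real genomic data.

```python
import math

ALPHABET = "ACGT"


def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | current).

    Additive smoothing (pseudocount) keeps every probability non-zero,
    so the log-ratio in log_odds is always defined.
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
        for a in ALPHABET
    }


def log_odds(seq, island, background):
    """Sum log(P_island / P_background) over adjacent nucleotide pairs.

    A positive score means the island model explains the sequence better.
    """
    return sum(
        math.log(island[a][b] / background[a][b]) for a, b in zip(seq, seq[1:])
    )


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: count(CG) * N / (count(C) * count(G))."""
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


# Toy training strings (assumed, illustrative only; not real annotated DNA).
island_model = train_markov(["CGCGGCGCCGCG", "GCGCGCCGGCGC"])
background_model = train_markov(["ATTATAATGCAT", "TTAACATGTTAA"])

score_cg = log_odds("CGCGCG", island_model, background_model)
score_at = log_odds("ATATAT", island_model, background_model)
```

With these toy models, a CG-rich query scores positive and an AT-rich query scores negative. In practice the two transition tables would be estimated from annotated island and non-island subsequences, and the score would be computed over a sliding window to locate variable-length islands.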





Publication date: 2001